Abstract

Polarimetric Synthetic Aperture Radar (POLSAR) images have great potential for land-use management, provided the images can be segmented efficiently. This paper describes the application of the robust competitive agglomeration (RCA) clustering algorithm to the segmentation of POLSAR images. Examples are presented and future efforts are discussed.

1. Introduction

J.S. Lee [1-2] has applied both hard c-means (HCM) and fuzzy c-means (FCM) clustering to Synthetic Aperture Radar (SAR) images. Verdi et al. [3] have also studied this approach on high-resolution polarimetric tri-band SAR data and shown segmentation results for both the HCM and the FCM. The robust fuzzy c-means (RFCM) clustering algorithm has also been applied to POLSAR images and produced encouraging segmentation results [4]. This paper applies a modified version of the robust competitive agglomeration (RCA) clustering algorithm [5] to segment POLSAR images. The RCA also provides an estimate of the number of clusters in the image. In section 2, POLSAR images are briefly described and previous publications in this area are discussed. In section 3, the applied version of the RCA is briefly described. Section 4 gives some examples, and section 5 contains the conclusions.

2. Polarimetric SAR Images

Polarimetric SAR images can be constructed from the complex scattering returns for the four possible polarization combinations of radar transmit and receive: HH, HV, VH, and VV. Because of reciprocity (symmetry) assumptions, the HV and VH returns are identical, yielding a 3-D complex scattering vector for each pixel in the image lattice. A considerable amount of preprocessing is required to form, register, and calibrate the image. The only feature used in this paper is the coherence matrix, a 3x3 Hermitian matrix defined as the outer product of the vector of three linear combinations of the complex scattering returns, HH + VV, HV, and HH - VV, with its own conjugate transpose. A real vector of dimension 9 is then constructed from the lower triangular part of this matrix: the three real diagonal entries plus the real and imaginary parts of the three off-diagonal entries. This feature vector, associated with each pixel of the image lattice, is used for clustering and classification. The dynamic range of this feature vector may be large, and outliers occur frequently.
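As a minimal sketch of the feature construction just described, the following Python function builds the coherence matrix from one pixel's complex returns and flattens its lower triangle into 9 real numbers. The element ordering (diagonal first, then real/imaginary pairs of the off-diagonal entries) and the omission of the usual Pauli normalization factors are assumptions, not details taken from the paper. A single-look outer product is shown; in practice the matrix would typically be averaged over looks or neighboring pixels.

```python
def coherency_features(hh: complex, hv: complex, vv: complex) -> list[float]:
    """Return a 9-element real feature vector for one pixel (hypothetical layout)."""
    # Scattering vector from the three combinations named in the text.
    # Assumption: no Pauli normalization (1/sqrt(2), factor 2 on HV) is applied.
    k = [hh + vv, hv, hh - vv]
    # Coherence matrix T = k k^H (single look).
    t = [[ki * kj.conjugate() for kj in k] for ki in k]
    # Diagonal entries of a Hermitian matrix are real.
    features = [t[i][i].real for i in range(3)]
    # Each strictly lower-triangular entry contributes a real and an imaginary part.
    for i, j in [(1, 0), (2, 0), (2, 1)]:
        features.extend([t[i][j].real, t[i][j].imag])
    return features
```

For example, `coherency_features(1 + 0j, 0j, 1 + 0j)` gives `[4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]`, since only the HH + VV combination is nonzero.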

Du and Lee [2] applied FCM to segment SAR images using a distance measure based on the Wishart measure, which replaced the usual squared Euclidean distance $d_{ik}^2$ in the FCM objective function. Similarly, the RFCM replaces $d_{ik}^2$ with Huber's $\rho$ function. Both approaches reduce the influence of outliers by replacing $d_{ik}^2$ with a more slowly increasing function of distance.
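For comparison, the Wishart classifier distance of Lee et al. is commonly written $d(T, V) = \ln|V| + \operatorname{tr}(V^{-1} T)$ for a pixel coherence matrix $T$ and a cluster-center matrix $V$. Assuming this is the form used in [1-2], a dependency-free Python sketch for 3x3 Hermitian matrices might look like the following; the helper names `det3`, `inv3`, and `wishart_distance` are hypothetical.

```python
import math

Matrix = list[list[complex]]

def det3(a: Matrix) -> complex:
    """Determinant of a 3x3 complex matrix by cofactor expansion along row 0."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def inv3(a: Matrix) -> Matrix:
    """Inverse of a 3x3 complex matrix via the adjugate (transposed cofactors)."""
    d = det3(a)
    cof = [[0j] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            r = [x for x in range(3) if x != i]
            c = [y for y in range(3) if y != j]
            minor = a[r[0]][c[0]] * a[r[1]][c[1]] - a[r[0]][c[1]] * a[r[1]][c[0]]
            cof[i][j] = (-1) ** (i + j) * minor
    return [[cof[j][i] / d for j in range(3)] for i in range(3)]

def wishart_distance(t: Matrix, v: Matrix) -> float:
    """d(T, V) = ln|V| + tr(V^{-1} T); V is assumed Hermitian positive definite."""
    v_inv = inv3(v)
    trace = sum(v_inv[i][j] * t[j][i] for i in range(3) for j in range(3))
    return math.log(det3(v).real) + trace.real
```

For fixed $T$ this distance is smallest when $V = T$, which is why it can serve as a cluster-assignment measure in place of $d_{ik}^2$.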

3. Segmentation via Robust Clustering

Clustering is often used to segment images because segmentation is essentially pattern recognition, i.e., classifying each pixel [6-7]. After the pixel feature vectors are clustered into c distinct classes, the image is segmented by labeling each pixel with the exemplar closest to it. The clustering method can employ either crisp sets, as with the HCM, or fuzzy sets, as with the FCM, RFCM, and RCA.
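The labeling step can be sketched in a few lines of Python. This hypothetical helper assigns each pixel feature vector to its nearest exemplar under squared Euclidean distance; for the FCM this is equivalent to choosing the class of largest membership, since FCM memberships decrease monotonically with distance.

```python
def segment(features: list[list[float]], exemplars: list[list[float]]) -> list[int]:
    """Label each pixel's feature vector with the index of its closest exemplar."""
    def sq_dist(a: list[float], b: list[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [min(range(len(exemplars)), key=lambda i: sq_dist(f, exemplars[i]))
            for f in features]
```

Reshaping the returned label list back onto the image lattice gives the segmented image.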

The HCM clustering algorithm is described in [8, p. 55], and its use with the Wishart measure in [1]. The FCM is a practical clustering algorithm that generalizes the HCM by replacing the class assignment with a membership vector whose elements represent the membership of the data point in each of the c distinct classes. The algorithm produces a fuzzy partition of the data and may be viewed as an unsupervised learning technique. The following description of the FCM is based on [8].


3.1 Fuzzy c-means

Consider N data samples forming the data set $X = \{x_1, x_2, \ldots, x_N\}$, where each sample $x_k \in \mathbb{R}^p$.

Assume that there are c classes and that $u_{ik} = u_i(x_k) \in [0,1]$ is the membership of the k-th sample $x_k$ in the i-th class with exemplar $v_i$, where $v = (v_1, v_2, \ldots, v_c)$ is the set of exemplars or prototypes and $U = [u_{ik}]$ is the membership matrix. Each sample $x_k$ satisfies the constraint $\sum_{i=1}^{c} u_{ik} = 1$. The FCM algorithm minimizes the function

$$J(U, v) = \sum_{k=1}^{N} \sum_{i=1}^{c} u_{ik}^{m_c}\, d_{ik}^2, \qquad d_{ik} = \|v_i - x_k\|_2,$$

subject to the above constraint. The alternating optimization (AO) method is one technique for minimizing $J(U, v)$. The power $m_c$ of the membership is called the weighting exponent. A detailed version of this algorithm is given in [8, p. 66].

HCM and FCM exemplars are linear statistics, i.e., weighted averages of the data points in which the weights are scaled versions of the memberships. Unfortunately, linear statistics are known to be vulnerable to outliers [9]. HCM may also be viewed as a special case of FCM in which the weighting exponent $m_c$ is 1 and the sample memberships are either 0 or 1. Compared with the FCM, the HCM is more often trapped in local minima.
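A minimal sketch of the AO iteration for the FCM, assuming Euclidean distance, a fixed iteration count in place of a convergence test, and caller-supplied initial exemplars. Each pass applies the standard FCM updates $u_{ik} = \left[\sum_{j} (d_{ik}/d_{jk})^{2/(m_c-1)}\right]^{-1}$ and $v_i = \sum_k u_{ik}^{m_c} x_k / \sum_k u_{ik}^{m_c}$; the small `eps` floor is an added guard against division by zero when an exemplar lands exactly on a sample.

```python
def fcm(data, init, m=2.0, iters=100, eps=1e-12):
    """Fuzzy c-means by alternating optimization (sketch, Euclidean distance).

    data: list of feature vectors; init: list of c initial exemplars.
    Returns (u, exemplars) with u[i][k] the membership of sample k in class i.
    """
    exemplars = [list(v) for v in init]
    c, n, p = len(exemplars), len(data), len(data[0])
    u = [[0.0] * n for _ in range(c)]
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik^2 / d_jk^2)^(1/(m-1)).
        for k, x in enumerate(data):
            d2 = [max(sum((a - b) ** 2 for a, b in zip(v, x)), eps) for v in exemplars]
            for i in range(c):
                u[i][k] = 1.0 / sum((d2[i] / d2[j]) ** (1.0 / (m - 1.0)) for j in range(c))
        # Exemplar update: weighted mean of the samples with weights u_ik^m.
        for i in range(c):
            wts = [u[i][k] ** m for k in range(n)]
            total = sum(wts)
            exemplars[i] = [sum(wk * x[q] for wk, x in zip(wts, data)) / total
                            for q in range(p)]
    return u, exemplars
```

On two well-separated 1-D groups such as `[[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]` started from `[[0.0], [5.0]]`, the exemplars settle near 0.1 and 5.1. Each exemplar remains a linear statistic, so a single distant outlier would pull it, which is the weakness the robust variants address.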

3.2 Robust Fuzzy c-Means Clustering

To robustify the algorithm, a softer error function replaces the $d_{ik}^2$ term. One can replace $d_{ik}^2$ with $d_{ik} = \|v_i - x_k\|_2$, and the resulting algorithm is called the fuzzy c-medians algorithm [10]. Another alternative is to replace $d_{ik}^2$ in the objective functional with Huber's $\rho$ function. The objective function is then

$$J(U, v) = \sum_{k=1}^{N} \sum_{i=1}^{c} u_{ik}^{m_c}\, \rho(d_{ik}), \qquad d_{ik} = \|v_i - x_k\|_2 .$$

The $\rho$ function applied in the examples of section 4 is

$$\rho(x) = \begin{cases} \tfrac{1}{2}x^2, & |x| \le 1, \\ |x| - \tfrac{1}{2}, & |x| > 1, \end{cases}$$

whose form is quadratic close to the exemplar but linear far from it. This particular $\rho$ function is the one used by Huber in his early papers. The optimal memberships are then given by

$$u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{\rho(d_{ik})}{\rho(d_{jk})} \right)^{1/(m_c - 1)} \right]^{-1}.$$

The exemplars are computed by the weighted mean

$$v_i = \frac{\sum_{k=1}^{N} u_{ik}^{m_c}\, w_{ik}\, x_k}{\sum_{k=1}^{N} u_{ik}^{m_c}\, w_{ik}},$$

where the Huber weights $w_{ik}$ depend upon $d_{ik}$ [11]. These estimates of $v_i$ are W-estimators, or robust recursive estimators, because the weights $w_{ik}$ are themselves functions of $v_i$. The weights have the form $w(x) = \psi(x)/x$, where $\psi(x) = \rho'(x)$. In this case

$$w(x) = \begin{cases} 1, & |x| \le 1, \\ 1/|x|, & |x| > 1, \end{cases}$$

which has the effect of gradually reducing the influence of the outliers. Thus the exemplar $v_i$ is a weighted combination of the sample values, where the weights depend on both the membership $u_{ik}$ of the k-th sample in the i-th cluster and a spatial Huber weight function [9].
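The Huber functions and one AO pass of the robust update can be sketched in Python as follows. This is a minimal sketch under stated assumptions: unit Huber threshold, distances divided by a caller-supplied `scale` before $\rho$ and $w$ are applied, and a single W-estimation step per pass (the text notes the W-estimator should ideally be iterated at each stage). The function names are hypothetical.

```python
import math

def huber_rho(x: float) -> float:
    """Huber's rho with unit threshold: quadratic inside, linear outside."""
    ax = abs(x)
    return 0.5 * x * x if ax <= 1.0 else ax - 0.5

def huber_weight(x: float) -> float:
    """w(x) = psi(x)/x with psi = rho': 1 inside the threshold, 1/|x| outside."""
    ax = abs(x)
    return 1.0 if ax <= 1.0 else 1.0 / ax

def rfcm_step(data, exemplars, scale, m=2.0, eps=1e-12):
    """One AO pass of a robust FCM: memberships from rho, exemplars as W-estimates."""
    c, n = len(exemplars), len(data)
    # Scaled distances d_ik / scale.
    dist = [[math.dist(v, x) / scale for x in data] for v in exemplars]
    u = [[0.0] * n for _ in range(c)]
    for k in range(n):
        r = [max(huber_rho(dist[i][k]), eps) for i in range(c)]
        for i in range(c):
            u[i][k] = 1.0 / sum((r[i] / r[j]) ** (1.0 / (m - 1.0)) for j in range(c))
    new_exemplars = []
    for i in range(c):
        # Combined weights u_ik^m * w_ik, used in numerator and denominator alike.
        wts = [u[i][k] ** m * huber_weight(dist[i][k]) for k in range(n)]
        total = sum(wts)
        new_exemplars.append([sum(wk * x[p] for wk, x in zip(wts, data)) / total
                              for p in range(len(data[0]))])
    return u, new_exemplars
```

With one cluster on `[[0.0], [0.1], [0.2], [100.0]]`, the plain mean is about 25.1, whereas one robust step from `[[0.1]]` with `scale=1.0` moves the exemplar only to about 0.43, because the outlier's Huber weight is 1/99.9.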


The advantage of the RFCM clustering algorithm is its resistance to outliers, but this comes at the expense of increased complexity in implementing the algorithm. For example, the W-estimator should be iterated at each stage of the RFCM, which increases its time complexity by a factor proportional to the number of iterations. Moreover, a scaling constant is needed in the Huber weight, requiring an auxiliary estimate of dispersion. Here the auxiliary information is obtained from a robust estimator, the median absolute deviation about the median (MAD). Since the RFCM is nonlinear, it requires a more careful initialization of the exemplars. If an exemplar is too far from every data point, the membership of all the data points in this exemplar will be essentially zero, and the algorithm must handle this special case to avoid underflow or overflow problems. In this paper, this problem is avoided by using Huber weights, which have infinite support yet vanishing weight. Finally, the most difficult part of the RFCM is that one must specify the number of clusters in advance, something that is usually not known. Determining the number of clusters is called the validity problem. The RCA algorithm applied in this paper attempts to retain the clustering behavior of the RFCM while obtaining a reasonable estimate of the optimum number of clusters.
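A sketch of the MAD-based scale constant follows. The paper does not say which values the MAD is computed over; taking it over the exemplar-to-sample distances is an assumption here, and the factor of 3 mirrors the setting reported in section 4. The names `mad` and `huber_scale` are hypothetical.

```python
import statistics

def mad(values):
    """Median absolute deviation about the median (unscaled)."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)

def huber_scale(distances, factor=3.0):
    """Scale constant for the Huber functions: factor * MAD (assumed usage)."""
    return factor * mad(distances)
```

For `[1, 2, 3, 4, 100]` the MAD is 1 (so `huber_scale` returns 3.0), while the sample standard deviation exceeds 40. This is why a median-based dispersion estimate suits data in which outliers are frequent.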


The RCA minimizes the functional

$$J(U, v) = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^{2}\, \rho(d_{ik}) \;-\; \alpha \sum_{i=1}^{c} \left[ \sum_{k=1}^{N} w_{ik}\, u_{ik} \right]^{2}$$

subject to $\sum_{i=1}^{c} u_{ik} = 1$, where the second term is a penalty imposed to produce a parsimonious number of clusters. Note that the weighting exponent is now fixed at 2, which was assumed to simplify the analysis. The parameter $\alpha$ represents the tradeoff between the two competing terms of the objective function $J(U, v)$. The first term is minimized when there are N clusters, and the second term is minimized when there is one cluster. Frigui [5] allows an initial period of agglomeration and then uses an exponential fader to reduce $\alpha$ to zero. The $\alpha$ and fading parameters must be set before running the program. In theory, one may grossly overestimate the number of clusters and the RCA will converge to a "best" number of clusters, so one no longer needs to specify the number of clusters in advance. However, one does have to specify three other parameters in advance: the $\alpha$ parameter, the fading parameter, and the cardinality threshold below which an active cluster is dropped. The RCA also uses the AO algorithm, with the same centering statistics for the exemplars as the RFCM and a modified membership update consisting of the RFCM membership plus a bias: $u_{ik} = u_{ik}^{RFCM} + \beta_{ik}$. The bias term is given by

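$$\beta_{ik} = \frac{\alpha}{\rho(d_{ik})} \left( N'_{ik} - \bar{N}_{k} \right),$$

where

$$N'_{ik} = w_{ik} \sum_{j=1}^{N} u_{ij}\, w_{ij} = w_{ik} N_i, \qquad \bar{N}_{k} = \frac{\sum_{l=1}^{c} N'_{lk} / \rho(d_{lk})}{\sum_{l=1}^{c} 1 / \rho(d_{lk})}.$$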

The bias term may be thought of as a correction to the RFCM that penalizes an increase in the number of clusters.
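A sketch of the RCA membership update with $m = 2$ follows. It assumes the bias has the form $\beta_{ik} = \alpha (N'_{ik} - \bar N_k)/\rho(d_{ik})$, with $N'_{ik} = w_{ik} N_i$, $N_i = \sum_j u_{ij} w_{ij}$ taken from the previous pass, and $\bar N_k$ the $1/\rho$-weighted average of $N'_{lk}$ over clusters. This follows the competitive agglomeration form of [5]; the exact modified version used in this paper is an assumption. Because $\bar N_k$ is that weighted average, the bias terms sum to zero over clusters, so each sample's memberships still sum to one. The sketch does not clip memberships that become negative.

```python
def rca_memberships(rho, w, u_prev, alpha):
    """RCA membership update u = u_RFCM + beta with m = 2 (sketch).

    rho[i][k]    : rho(d_ik) > 0 for cluster i and sample k
    w[i][k]      : Huber weight w_ik
    u_prev[i][k] : memberships from the previous iteration
    """
    c, n = len(rho), len(rho[0])
    # Robust fuzzy cardinality N_i = sum_j u_ij w_ij from the previous pass.
    card = [sum(u_prev[i][j] * w[i][j] for j in range(n)) for i in range(c)]
    u = [[0.0] * n for _ in range(c)]
    for k in range(n):
        inv = [1.0 / rho[i][k] for i in range(c)]
        s = sum(inv)
        n_prime = [w[i][k] * card[i] for i in range(c)]         # N'_ik = w_ik N_i
        n_bar = sum(n_prime[i] * inv[i] for i in range(c)) / s  # 1/rho-weighted average
        for i in range(c):
            u_rfcm = inv[i] / s                                 # RFCM membership, m = 2
            beta = alpha * inv[i] * (n_prime[i] - n_bar)        # competition bias
            u[i][k] = u_rfcm + beta
    return u
```

Setting `alpha = 0` recovers the plain RFCM memberships. A positive `alpha` shifts membership toward clusters with above-average cardinality, so small clusters shrink until they fall below the cardinality threshold.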


4. Examples

The set of parameters for the RCA is large enough to make it very adaptable to different image environments, but this also means that one has to initialize these parameters. A means of setting them automatically will be needed if this algorithm is ever fielded; for this paper, however, one can only discuss the influence of each of these parameters and present some example images. The parameter $\alpha$ determines the trade-off between the within-cluster error and the excess-cluster-number penalty. The settings of $\alpha$ and the fading parameter provide the trade-off between fidelity and computational complexity. Fewer clusters reduce the time and space complexity of the algorithm at the expense of losing the finer granularity of objects in the image. At each iteration, the ratio of the first term to the second is computed to maintain the same relation between these two competing terms. Then

$$\alpha = \eta \cdot \text{ratio} \cdot \exp(-\text{index}/\tau),$$

where index is the iteration number, $\eta$ is the constant that controls the relative importance of the two terms, $\exp(-\text{index}/\tau)$ is the exponential fader, and $\tau$ is a time constant. Once $\text{index}/\tau > 3$, the fader is below $e^{-3} \approx 0.05$ and the effect of the second term has essentially faded out.
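The $\alpha$ schedule can be sketched as below. The assumption here is that "the second term" in the ratio means the unsigned penalty $\sum_i (\sum_k w_{ik} u_{ik})^2$ of the RCA functional, so that the ratio is positive. The names `rca_terms` and `rca_alpha` are hypothetical.

```python
import math

def rca_terms(u, rho, w):
    """Return (first, second): sum u^2 rho and the unsigned penalty sum_i (sum_k w u)^2."""
    c, n = len(u), len(u[0])
    first = sum(u[i][k] ** 2 * rho[i][k] for i in range(c) for k in range(n))
    second = sum(sum(w[i][k] * u[i][k] for k in range(n)) ** 2 for i in range(c))
    return first, second

def rca_alpha(index, first, second, eta, tau):
    """alpha = eta * (first / second) * exp(-index / tau)."""
    return eta * (first / second) * math.exp(-index / tau)
```

With $\eta = 4$ and $\tau = 10$, the fader equals $e^{-3} \approx 0.05$ at iteration 30 and $e^{-5} \approx 0.0067$ at iteration 50, so $\alpha$ is effectively zero by the end of a 50-iteration run.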

The granularity of the objects that can be resolved is also a function of another parameter, which discards an active cluster once its fuzzy cardinality falls below a threshold. Multiple cluster exemplars sharing a given cluster of points have nevertheless been observed; because both clusters were above the cardinality threshold, both remained stable. The number of clusters is also influenced by the scale parameter used in the Huber $\rho$ function. If this scale is chosen too small, a given cluster will be broken up into a cluster of clusters, which can be interpreted either as a modeling advantage or as a drawback. Discovering the correct set of parameters for a given set of POLSAR images is time-consuming and also depends upon what information one tries to extract from the image. These parameters may differ from one collection of images to another. Considerable effort remains to be invested in learning these tradeoffs.
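Cluster removal by cardinality threshold can be sketched as follows. Using $\sum_k u_{ik}$ as the fuzzy cardinality and renormalizing the surviving memberships so that each sample's memberships again sum to one are assumptions; a robust cardinality $\sum_k w_{ik} u_{ik}$ would be an equally plausible choice. The name `prune_clusters` is hypothetical.

```python
def prune_clusters(exemplars, u, min_card):
    """Drop clusters whose fuzzy cardinality sum_k u_ik falls below min_card,
    then renormalize memberships so each sample's memberships sum to 1."""
    keep = [i for i in range(len(exemplars)) if sum(u[i]) >= min_card]
    ex = [exemplars[i] for i in keep]
    uu = [list(u[i]) for i in keep]
    for k in range(len(u[0])):
        s = sum(row[k] for row in uu)
        if s > 0:
            for row in uu:
                row[k] /= s
    return ex, uu
```

In the blueberry example of this section, `min_card` would correspond to the minimum cardinality of 1162 pixels.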

The images presented in this section apply to land-use management, where one is trying to classify large physical features and therefore emphasizes parsimony in cluster number over within-cluster error. Figure 1 is an image of blueberry fields, where one is interested in classifying harvested and non-bearing blueberry fields versus bearing blueberry fields.

Figure 1 is a 341x341 pixel POLSAR span image in which each pixel reflects the total power in the HH, HV, and VV returns after proper censoring and scaling. The image consists of various patches of blueberry fields in different stages of development. The lighter fields are ready to harvest, and the darker fields are either non-bearing or harvested. Figure 2 shows the image constructed from 6 clusters derived by the RCA algorithm from an initial set of 20 clusters after 50 iterations. Here $\tau$ is 10, $\eta$ is 4, the scale factor is three times the MAD estimate, and the minimum cardinality was set at one 50th of the number of pixels in the image, or 1162 pixels. The RCA reduces the number of clusters by a factor of more than three. The 50 iterations represent five time constants, i.e., index = $5\tau$, so the influence of the second term has essentially been discounted.


Figure 1. POLSAR image.


Figure 2. Results of the RCA, 50 iterations.

The six exemplars generated by the RCA give some insight into how image features are represented as exemplars and into the capability of the RCA to aid in exploratory data analysis. Figure 3 plots the six exemplar vectors, although one immediately notices that only five appear. The exemplars are ordered by increasing energy, and the two most energetic exemplars are so similar that they appear identical at this scale. What is attractive about this concise representation of the image is that one can essentially assign exemplars, or groups of exemplars, to specific features in the image. The lowest-energy exemplar represents the ground, and the second exemplar represents the harvested blueberries. The third and fourth exemplars represent the blueberries ready for harvest, and finally the fifth and sixth exemplars represent groups of trees or larger structural elements in the blueberry fields. Even more revealing in this plot is the possibility of data reduction. Since only the 1st, 2nd, 3rd, 6th, and 9th elements of the feature vector seem to contain discriminatory information, one can reduce the dimensionality of the representation from 9 to 5, nearly one-half, with an accompanying reduction in time and space complexity.
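The observation above was made by inspecting the exemplar plot. One hypothetical way to automate it is to keep the components whose spread across the exemplars is a non-negligible fraction of the largest spread. Both the rule and the `rel_threshold` value below are assumptions for illustration, not the paper's procedure.

```python
def discriminating_components(exemplars, rel_threshold=0.05):
    """1-based indices of components whose spread across exemplars is at least
    rel_threshold times the largest spread (a hypothetical selection rule)."""
    p = len(exemplars[0])
    spread = [max(e[j] for e in exemplars) - min(e[j] for e in exemplars) for j in range(p)]
    top = max(spread)
    return [j + 1 for j in range(p) if top > 0 and spread[j] >= rel_threshold * top]

def reduce_features(vectors, components):
    """Keep only the listed 1-based components of each feature vector."""
    return [[v[j - 1] for j in components] for v in vectors]
```

Applying `reduce_features(vectors, [1, 2, 3, 6, 9])` keeps the five components singled out for this image. Any reduction chosen this way should be rechecked for other image collections, since the informative components may differ.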


Figure 3. Six exemplars for this image.


Figure 4. Filtered version of this image.

Figure 4 contains a filtered version of Figure 2 in which the two lowest-energy clusters have been mapped to a first value, the third and fourth clusters to a second value, and the two most energetic clusters to a third value. The resulting image tends to segment the yielding blueberry fields. Although the segmentation is not totally distinct, it must be remembered that natural growth does not produce the distinct boundaries characteristic of man-made objects, so the segmentation is not as sharp as it would be for objects produced on an assembly line.
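The mapping used for Figure 4 can be sketched as follows. The energy measure (sum of the first three components, i.e. the diagonal of the coherence matrix under the feature layout assumed earlier) and the name `group_by_energy` are assumptions. The rank groups reproduce the 2-2-2 grouping described above.

```python
def group_by_energy(labels, exemplars, groups=((0, 1), (2, 3), (4, 5))):
    """Map cluster labels to coarse classes according to exemplar energy rank."""
    # Assumed energy measure: sum of the three diagonal (total-power) components.
    energy = [sum(e[:3]) for e in exemplars]
    order = sorted(range(len(exemplars)), key=lambda i: energy[i])
    rank = {cluster: r for r, cluster in enumerate(order)}
    rank_to_class = {r: g for g, ranks in enumerate(groups) for r in ranks}
    return [rank_to_class[rank[lab]] for lab in labels]
```

Applied to the per-pixel labels of Figure 2, this yields a three-level image in which class 0 holds the two lowest-energy clusters and class 2 the two most energetic ones.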

5. Conclusions

The modified robust competitive agglomeration (RCA) clustering algorithm has been applied to segment POLSAR images. By ordering the clusters by received signal energy, one can further segment the image structures, with the low-energy clusters representing low-growth vegetation and the high-energy clusters representing large growth structures and man-made objects. Outliers and exceedingly small structures are removed from the images by setting the fuzzy cardinality threshold for rejection. The RCA is also a good exploratory data analysis tool because one can drive the clustering to emphasize either fidelity or simplicity. Simplicity produces a parsimonious representation and a more efficient algorithm. Fidelity allows the user to adjust the granularity to match the application. In either case, the RCA suggests which components of the data vector contribute to the discrimination and thus which components may be eliminated to reduce the dimensionality without loss of discrimination.